{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using Haiku as a sub-agent\n",
    "\n",
    "In this recipe, we'll demonstrate how to analyze Apple's 2023 financial earnings reports using Claude 3 Haiku sub-agent models to extract relevant information from earnings release PDFs. We'll then use Claude 3 Opus to generate a response to our question and create a graph using matplotlib to accompany its response."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Set up the environment\n",
    "First, let's install the required libraries and set up the Claude API client."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install anthropic IPython PyMuPDF matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required libraries\n",
    "import base64\n",
    "import io\n",
    "import os\n",
    "from concurrent.futures import ThreadPoolExecutor\n",
    "\n",
    "import fitz\n",
    "import requests\n",
    "from anthropic import Anthropic\n",
    "from PIL import Image\n",
    "\n",
    "# Set up the Claude API client\n",
    "client = Anthropic()\n",
    "MODEL_NAME = \"claude-haiku-4-5\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Gather our documents and ask a question\n",
    "For this example, we will be using all Apple's financial statements from the 2023 financial year and asking about the net sales over the year."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List of Apple's earnings release PDF URLs\n",
    "pdf_urls = [\n",
    "    \"https://www.apple.com/newsroom/pdfs/fy2023-q4/FY23_Q4_Consolidated_Financial_Statements.pdf\",\n",
    "    \"https://www.apple.com/newsroom/pdfs/fy2023-q3/FY23_Q3_Consolidated_Financial_Statements.pdf\",\n",
    "    \"https://www.apple.com/newsroom/pdfs/FY23_Q2_Consolidated_Financial_Statements.pdf\",\n",
    "    \"https://www.apple.com/newsroom/pdfs/FY23_Q1_Consolidated_Financial_Statements.pdf\",\n",
    "]\n",
    "\n",
    "# User's question\n",
    "QUESTION = \"How did Apple's net sales change quarter to quarter in the 2023 financial year and what were the key contributors to the changes?\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Download and convert PDFs to images\n",
    "Next, we'll define functions to download the earnings release PDFs and convert them to base64-encoded PNG images. We have to do this because these PDFs are full of tables that are hard to parse with traditional PDF parsers. It's easier if we just convert them to images and pass the images to Haiku.\n",
    "\n",
    "The ```download_pdf``` function downloads a PDF file from a given URL and saves it to the specified folder. The ```pdf_to_base64_pngs``` function converts a PDF to a list of base64-encoded PNG images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Function to download a PDF file from a URL and save it to a specified folder\ndef download_pdf(url, folder):\n    response = requests.get(url, timeout=60)\n    if response.status_code == 200:\n        file_name = os.path.join(folder, url.split(\"/\")[-1])\n        with open(file_name, \"wb\") as file:\n            file.write(response.content)\n        return file_name\n    else:\n        print(f\"Failed to download PDF from {url}\")\n        return None\n\n\n# Define the function to convert a PDF to a list of base64-encoded PNG images\ndef pdf_to_base64_pngs(pdf_path, quality=75, max_size=(1024, 1024)):\n    # Open the PDF file\n    doc = fitz.open(pdf_path)\n\n    base64_encoded_pngs = []\n\n    # Iterate through each page of the PDF\n    for page_num in range(doc.page_count):\n        # Load the page\n        page = doc.load_page(page_num)\n\n        # Render the page as a PNG image\n        pix = page.get_pixmap(matrix=fitz.Matrix(300 / 72, 300 / 72))\n\n        # Convert the pixmap to a PIL Image\n        image = Image.frombytes(\"RGB\", [pix.width, pix.height], pix.samples)\n\n        # Resize the image if it exceeds the maximum size\n        if image.size[0] > max_size[0] or image.size[1] > max_size[1]:\n            image.thumbnail(max_size, Image.Resampling.LANCZOS)\n\n        # Convert the PIL Image to base64-encoded PNG\n        image_data = io.BytesIO()\n        image.save(image_data, format=\"PNG\", optimize=True, quality=quality)\n        image_data.seek(0)\n        base64_encoded = base64.b64encode(image_data.getvalue()).decode(\"utf-8\")\n        base64_encoded_pngs.append(base64_encoded)\n\n    # Close the PDF document\n    doc.close()\n\n    return base64_encoded_pngs\n\n\n# Folder to save the downloaded PDFs\nfolder = \"../images/using_sub_agents\"\n\n\n# Create the directory if it doesn't exist\nos.makedirs(folder)\n\n# Download the PDFs concurrently\nwith ThreadPoolExecutor() as executor:\n    pdf_paths = list(executor.map(download_pdf, pdf_urls, [folder] * len(pdf_urls)))\n\n# Remove any None values (failed downloads) from pdf_paths\npdf_paths = [path for path in pdf_paths if path is not None]"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use ThreadPoolExecutor to download the PDFs concurrently and store the file paths in pdf_paths."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Generate a specific prompt for Haiku using Opus\n",
    "Let's use Opus as an orchestrator and have it write a specific prompt for each Haiku sub-agent based on the user provided question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extract the following information from the Apple earnings report PDF for the quarter:\n",
      "1. Apple's net sales for the quarter\n",
      "2. Quarter-over-quarter change in net sales\n",
      "3. Key product categories, services, or regions that contributed significantly to the change in net sales\n",
      "4. Any explanations provided for the changes in net sales\n",
      "\n",
      "Organize the extracted information in a clear, concise format focusing on the key data points and insights related to the change in net sales for the quarter.\n"
     ]
    }
   ],
   "source": [
    "def generate_haiku_prompt(question):\n",
    "    messages = [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": f\"Based on the following question, please generate a specific prompt for an LLM sub-agent to extract relevant information from an earning's report PDF. Each sub-agent only has access to a single quarter's earnings report. Output only the prompt and nothing else.\\n\\nQuestion: {question}\",\n",
    "                }\n",
    "            ],\n",
    "        }\n",
    "    ]\n",
    "\n",
    "    response = client.messages.create(model=\"claude-opus-4-1\", max_tokens=2048, messages=messages)\n",
    "\n",
    "    return response.content[0].text\n",
    "\n",
    "\n",
    "haiku_prompt = generate_haiku_prompt(QUESTION)\n",
    "print(haiku_prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Extract information from PDFs\n",
    "Now, let's define our question and extract information from the PDFs using sub-agent Haiku models. We format the information from each model into a neatly defined set of XML tags."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<info quarter=\"Q4\">According to the condensed consolidated statements of operations, Apple's net sales changed as follows in the 2023 financial year:\n",
      "\n",
      "Quarter Ended September 30, 2023:\n",
      "- Total net sales were $89,498 million, up from $90,146 million in the prior year quarter.\n",
      "- Product sales were $67,184 million and services sales were $22,314 million.\n",
      "\n",
      "Key contributors to the changes:\n",
      "- Product sales decreased from $70,958 million in the prior year quarter.\n",
      "- Services sales increased from $19,188 million in the prior year quarter.\n",
      "\n",
      "Overall, Apple's total net sales decreased slightly compared to the same quarter in the prior year, driven by a decline in product sales which was partially offset by growth in services sales.</info>\n",
      "<info quarter=\"Q3\">Based on the financial statements provided, Apple's net sales changed as follows between the quarters in the 2023 financial year:\n",
      "\n",
      "- Net sales increased from $81,797 million in the three months ended July 1, 2023 to $82,959 million in the three months ended June 25, 2022. This represents a quarter-over-quarter increase of around $1,162 million.\n",
      "\n",
      "The key contributors to this increase appear to be:\n",
      "\n",
      "- Products sales increased from $60,584 million to $63,355 million, an increase of around $2,771 million.\n",
      "- Services sales increased from $21,213 million to $19,604 million, an increase of around $1,609 million.\n",
      "\n",
      "So the growth in both product sales and services sales contributed to the overall increase in Apple's net sales between these two quarters in the 2023 fiscal year.</info>\n",
      "<info quarter=\"Q2\">According to the financial statements, Apple's net sales changed significantly between the quarters in the 2023 financial year:\n",
      "\n",
      "- For the three months ended April 1, 2023, net sales were $94,836 million.\n",
      "- For the six months ended March 26, 2022, net sales were $97,278 million.\n",
      "\n",
      "This shows that net sales decreased by around $2,442 million from the prior six-month period to the most recent three-month period.\n",
      "\n",
      "The key contributors to this change in net sales appear to be:\n",
      "\n",
      "1. Products sales decreased from $107,560 million in the prior six-month period to $73,929 million in the recent three-month period.\n",
      "2. Services sales increased from $41,673 million in the prior six-month period to $20,907 million in the recent three-month period.\n",
      "\n",
      "So the decrease in product sales was the primary driver behind the overall decline in net sales from the prior period to the most recent quarter.</info>\n",
      "<info quarter=\"Q1\">According to the condensed consolidated statements of operations, Apple's net sales increased from $117,154 million in the three months ended December 31, 2022 to $123,945 million in the three months ended December 25, 2021. This represents a quarter-over-quarter increase in net sales.\n",
      "\n",
      "The key contributors to this change were:\n",
      "\n",
      "1. Increase in product sales revenue from $96,388 million to $104,429 million.\n",
      "2. Increase in services revenue from $20,766 million to $19,516 million.\n",
      "\n",
      "So the overall increase in net sales was driven primarily by higher product sales, with services revenue also showing a smaller increase quarter-over-quarter.</info>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def extract_info(pdf_path, haiku_prompt):\n",
    "    base64_encoded_pngs = pdf_to_base64_pngs(pdf_path)\n",
    "\n",
    "    messages = [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                *[\n",
    "                    {\n",
    "                        \"type\": \"image\",\n",
    "                        \"source\": {\n",
    "                            \"type\": \"base64\",\n",
    "                            \"media_type\": \"image/png\",\n",
    "                            \"data\": base64_encoded_png,\n",
    "                        },\n",
    "                    }\n",
    "                    for base64_encoded_png in base64_encoded_pngs\n",
    "                ],\n",
    "                {\"type\": \"text\", \"text\": haiku_prompt},\n",
    "            ],\n",
    "        }\n",
    "    ]\n",
    "\n",
    "    response = client.messages.create(model=\"claude-haiku-4-5\", max_tokens=2048, messages=messages)\n",
    "\n",
    "    return response.content[0].text, pdf_path\n",
    "\n",
    "\n",
    "def process_pdf(pdf_path):\n",
    "    return extract_info(pdf_path, haiku_prompt)\n",
    "\n",
    "\n",
    "# Process the PDFs concurrently with Haiku sub-agent models\n",
    "with ThreadPoolExecutor() as executor:\n",
    "    extracted_info_list = list(executor.map(process_pdf, pdf_paths))\n",
    "\n",
    "extracted_info = \"\"\n",
    "# Display the extracted information from each model call\n",
    "for info in extracted_info_list:\n",
    "    extracted_info += (\n",
    "        '<info quarter=\"' + info[1].split(\"/\")[-1].split(\"_\")[1] + '\">' + info[0] + \"</info>\\n\"\n",
    "    )\n",
    "print(extracted_info)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We extract information from the PDFs concurrently using sub-agent models and combine the extracted information. We then prepare the messages for the powerful model, including the question and the extracted information, and ask it to generate a response and matplotlib code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6: Pass the information to Opus to generate a response\n",
    "Now that we have fetched the information from each PDF using the sub-agents, let's call Opus to actually answer the question and write code to create a graph to accompany the answer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generated Response:\n",
      "Based on the extracted information from Apple's earnings releases, Apple's net sales changed as follows in the 2023 financial year:\n",
      "\n",
      "In Q1, net sales increased from $117,154 million in the previous quarter to $123,945 million, driven by increases in both product sales and services revenue.\n",
      "\n",
      "In Q2, net sales decreased by around $2,442 million compared to the prior six-month period, primarily due to a decrease in product sales, which was partially offset by an increase in services sales.\n",
      "\n",
      "In Q3, net sales increased by approximately $1,162 million compared to the previous quarter, with growth in both product sales and services sales contributing to the overall increase.\n",
      "\n",
      "In Q4, total net sales decreased slightly compared to the same quarter in the prior year, driven by a decline in product sales, which was partially offset by growth in services sales.\n",
      "\n",
      "Here's a Python code snippet using the matplotlib library to visualize the quarterly net sales data:\n",
      "\n",
      "<code>\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "quarters = ['Q1', 'Q2', 'Q3', 'Q4']\n",
      "net_sales = [123945, 94836, 82959, 89498]\n",
      "\n",
      "plt.figure(figsize=(8, 6))\n",
      "plt.bar(quarters, net_sales, color='skyblue', width=0.6)\n",
      "plt.xlabel('Quarter')\n",
      "plt.ylabel('Net Sales (in millions)')\n",
      "plt.title('Apple Net Sales by Quarter - FY 2023')\n",
      "plt.xticks(quarters)\n",
      "plt.yticks([0, 20000, 40000, 60000, 80000, 100000, 120000])\n",
      "plt.grid(True, linestyle='--', alpha=0.5)\n",
      "\n",
      "for i, v in enumerate(net_sales):\n",
      "    plt.text(i, v + 1000, f'${v:,.0f}', ha='center', fontsize=10)\n",
      "\n",
      "plt.show()\n",
      "</code>\n",
      "\n",
      "This code creates a bar chart showing Apple's net sales for each quarter in the 2023 financial year. The x-axis represents the quarters, and the y-axis represents the net sales in millions of dollars. The chart also includes data labels showing the exact net sales values for each quarter.\n"
     ]
    }
   ],
   "source": [
    "# Prepare the messages for the powerful model\n",
    "messages = [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [\n",
    "            {\n",
    "                \"type\": \"text\",\n",
    "                \"text\": f\"Based on the following extracted information from Apple's earnings releases, please provide a response to the question: {QUESTION}\\n\\nAlso, please generate Python code using the matplotlib library to accompany your response. Enclose the code within <code> tags.\\n\\nExtracted Information:\\n{extracted_info}\",\n",
    "            }\n",
    "        ],\n",
    "    }\n",
    "]\n",
    "\n",
    "# Generate the matplotlib code using the powerful model\n",
    "response = client.messages.create(model=\"claude-opus-4-1\", max_tokens=4096, messages=messages)\n",
    "\n",
    "generated_response = response.content[0].text\n",
    "print(\"Generated Response:\")\n",
    "print(generated_response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 7: Extract response and execute Matplotlib code\n",
    "Finally, let's extract the matplotlib code from the generated response and execute it to visualize the revenue growth trend.\n",
    "\n",
    "We define the ```extract_code_and_response``` function to extract the matplotlib code and non-code response from the generated response. We print the non-code response and execute the matplotlib code if it is found.\n",
    "\n",
    "Note that it is not good practice to use ```exec``` on model-written code outside of a sandbox but for the purposes of this demo we are doing it :)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Extract the matplotlib code from the response\n# Function to extract the code and non-code parts from the response\ndef extract_code_and_response(response):\n    start_tag = \"<code>\"\n    end_tag = \"</code>\"\n    start_index = response.find(start_tag)\n    end_index = response.find(end_tag)\n    if start_index != -1 and end_index != -1:\n        code = response[start_index + len(start_tag) : end_index].strip()\n        non_code_response = response[:start_index].strip()\n        return code, non_code_response\n    else:\n        return None, response.strip()\n\n\nmatplotlib_code, non_code_response = extract_code_and_response(generated_response)\n\nprint(non_code_response)\nif matplotlib_code:\n    # Execute the extracted matplotlib code\n    # Note: exec is used here for demonstration purposes to run model-generated visualization code.\n    # In production, use a sandboxed environment for executing untrusted code.\n    exec(matplotlib_code)  # noqa: S102\nelse:\n    print(\"No matplotlib code found in the response.\")"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}